16 research outputs found
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
SugarDrawer: A Web-Based Database Search Tool with Editing Glycan Structures
In life science fields, database integration is progressing and contributing to collaboration between different research fields, including the glycosciences. The integration of glycan databases has greatly advanced collaboration worldwide with the development of the international glycan structure repository, GlyTouCan. This trend has increased the need for a tool by which researchers in various fields can easily search for glycan structures in integrated databases. We have developed a web-based glycan structure search tool, SugarDrawer, which supports the depiction of glycans including ambiguity, such as glycan fragments which contain underdetermined linkages, and a database search for glycans drawn on the canvas. This tool provides an easy editing feature for various glycan structures in just a few steps using template structures and pop-up windows which allow users to select specific information for each structure element. This tool has a unique feature for selecting possible attachment sites, which is defined in the Symbol Nomenclature for Glycans (SNFG). In addition, this tool can input and output glycans in WURCS and GlycoCT formats, which are the most commonly used text formats for glycan structures.
The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application
Recent years have seen great advances in the development of glycoproteomics protocols and methods, resulting in a sustained increase in the reporting of proteins, their attached glycans and glycosylation sites. However, only very few of these reports find their way into databases or data repositories. One of the major reasons is the absence of a digital standard to represent glycoproteins and their challenging annotation with glycans. Depending on the experimental method, such a standard must be able to represent glycans as complete structures or as compositions, store not just single glycans but also represent glycoforms on a specific glycosylation site, deal with partially missing site information if no site mapping was performed, and store abundances or ratios of glycans within a glycoform of a specific site. To support the above, we have developed the GlycoConjugate Ontology (GlycoCoO) as a standard semantic framework to describe and represent glycoproteomics data. GlycoCoO can be used to represent glycoproteomics data in triplestores and can serve as a basis for data exchange formats. The ontology, database providers and supporting documentation are available online (https://github.com/glycoinfo/GlycoCoO).
GlycoRDF: an ontology to standardize glycomics data in RDF
Motivation: Over the last decades several glycomics-based bioinformatics resources and databases have been created and released to the public. Unfortunately, there is no common standard in the representation of the stored information or a common machine-readable interface allowing bioinformatics groups to easily extract and cross-reference the stored information. Results: An international group of bioinformatics experts in the field of glycomics have worked together to create a standard Resource Description Framework (RDF) representation for glycomics data, focused on glycan sequences and related biological source, publications and experimental data. This RDF standard is defined by the GlycoRDF ontology and will be used by database providers to generate common machine-readable exports of the data stored in their databases. Availability and implementation: The ontology, supporting documentation and source code used by database providers to generate standardized RDF are available online (http://www.glycoinfo.org/GlycoRDF/). Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Latest developments in Semantic Web technologies applied to the glycosciences
The Integrated Life Science Database Project of Japan funded a group of glycoscientists to carry out a project to integrate glycoscience databases using Semantic Web technologies. As a continuation of the previous project period, the Japan Consortium for Glycobiology and Glycotechnology Database (JCGGDB) developed several glycoscience-related databases. The GlycoProtDB database is among those being integrated, providing an important resource to understand protein glycosylation. Another database being integrated is GlycoEpitope, a comprehensive database of carbohydrate epitopes and antibodies. In the current project period, we started the development of GlyTouCan, the international glycan structure repository providing unique accession numbers to all glycan structures. Although such databases are sufficiently important in and of themselves, their integration with other -omics data such as the protein information in UniProt will be crucial to bring glycosciences to the forefront of life sciences. However, to integrate such disparate sets of data among different fields in a way such that future maintenance costs are minimal, standardized ontologies and formats must be established. Our latest project has attempted to define the minimal standards that are necessary to enable this integration. The technical challenges to integrate all these databases and the technologies to overcome these challenges will be described.
Introducing glycomics data into the Semantic Web
Background: Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results: In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. Conclusions: We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
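The cross-database linking described in this abstract can be illustrated with a toy example. The sketch below is not the authors' Virtuoso setup or their SPARQL queries; it only mimics, in plain Python, how two triple patterns joined on a shared glycan identifier connect records from separate sources. All identifiers and predicate names (`glycan:G1`, `ex:hasSequence`, `ex:carriesGlycan`) are invented for illustration and are not taken from any real ontology or data set.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical export of a glycan structure database (made-up identifiers).
GLYCAN_DB: List[Triple] = [
    ("glycan:G1", "ex:hasSequence", "seq-A"),
    ("glycan:G2", "ex:hasSequence", "seq-B"),
]

# Hypothetical export of a protein database that refers to the same glycans.
PROTEIN_DB: List[Triple] = [
    ("protein:P1", "ex:carriesGlycan", "glycan:G1"),
    ("protein:P2", "ex:carriesGlycan", "glycan:G2"),
    ("protein:P1", "ex:label", "Example protein 1"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def proteins_for_sequence(store: List[Triple], sequence: str) -> List[str]:
    """Join two triple patterns on the shared glycan identifier."""
    result = []
    for glycan, _, _ in match(store, p="ex:hasSequence", o=sequence):
        for protein, _, _ in match(store, p="ex:carriesGlycan", o=glycan):
            result.append(protein)
    return sorted(result)


# Loading both exports into one store is what makes the join possible.
merged = GLYCAN_DB + PROTEIN_DB
print(proteins_for_sequence(merged, "seq-A"))
```

The join works only because both sources use the same identifier for the same glycan, which is the kind of agreement a shared minimal standard is meant to provide.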
WURCS: The Web3 Unique Representation of Carbohydrate Structures
In recent years, the Semantic Web has become the focus of life science database development as a means to link life science data in an effective and efficient manner. In order for carbohydrate data to be applied to this new technology, there are two requirements for carbohydrate data representations: (1) a linear notation which can be used as a URI (Uniform Resource Identifier) if needed and (2) a unique notation such that any published glycan structure can be represented distinctively. This latter requirement includes the possible representation of nonstandard monosaccharide units as a part of the glycan structure, as well as compositions, repeating units, and ambiguous structures where linkages/linkage positions are unidentified. Therefore, we have developed the Web3 Unique Representation of Carbohydrate Structures (WURCS) as a new linear notation for representing carbohydrates for the Semantic Web.
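Requirement (1), a linear notation usable as a URI, can be sketched as follows. This is a minimal illustration, assuming a hypothetical namespace `http://example.org/glycan/` and an illustrative WURCS-style string; it is not how any particular repository mints its identifiers. It only shows that a one-line notation can be percent-encoded into a URI and recovered unchanged.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace, chosen for illustration only.
BASE = "http://example.org/glycan/"


def wurcs_to_uri(wurcs: str, base: str = BASE) -> str:
    """Percent-encode a linear notation so it can serve as a URI path segment."""
    # safe="" also encodes "/", so the notation stays a single path segment.
    return base + quote(wurcs, safe="")


def uri_to_wurcs(uri: str, base: str = BASE) -> str:
    """Recover the original linear notation from a URI built by wurcs_to_uri."""
    if not uri.startswith(base):
        raise ValueError("URI does not belong to the expected namespace")
    return unquote(uri[len(base):])


# Illustrative WURCS-style string; treat it as sample input, not curated data.
example = "WURCS=2.0/1,1,0/[a2122h-1b_1-5]/1/"
uri = wurcs_to_uri(example)
print(uri)
print(uri_to_wurcs(uri) == example)
```

Because the encoding is reversible, two structures receive the same URI exactly when their linear notations are identical, which is why requirement (2), uniqueness of the notation, matters for identifiers built this way.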
GlyTouCan 1.0 – The international glycan structure repository
Glycans are known as the third major class of biopolymers, next to DNA and proteins. They cover the surfaces of many cells, serving as the "face" of cells, whereby other biomolecules and viruses interact. The structure of glycans, however, differs greatly from DNA and proteins in that they are branched, as opposed to linear sequences of amino acids or nucleotides. Therefore, the storage of glycan information in databases, let alone their curation, has been a difficult problem. This has caused many duplicated efforts when integration is attempted between different databases, making an international repository for glycan structures, where unique accession numbers are assigned to every identified glycan structure, necessary. As such, an international team of developers and glycobiologists have collaborated to develop this repository, called GlyTouCan and available at http://glytoucan.org/, to provide a centralized resource for depositing glycan structures, compositions and topologies, and to retrieve accession numbers for each of these registered entries. This will thus enable researchers to reference glycan structures simply by accession number, as opposed to by chemical structure, which has been a burden when integrating glycomics databases in the past.